Joint probability distribution

In the study of probability, given two random variables X and Y that are defined on the same probability space, the joint distribution for X and Y defines the probability of events defined in terms of both X and Y. In the case of only two random variables this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution. The form the joint distribution takes depends on whether the random variables are dependent or independent.


Example

Consider the roll of a fair die and let A = 1 if the number is even (i.e., 2, 4, or 6) and A = 0 otherwise. Furthermore, let B = 1 if the number is prime (i.e., 2, 3, or 5) and B = 0 otherwise. Then the joint distribution of A and B is


  \mathrm{P}(A=0,B=0)=P\{1\}=\frac{1}{6},\; \mathrm{P}(A=1,B=0)=P\{4,6\}=\frac{2}{6}

  \mathrm{P}(A=0,B=1)=P\{3,5\}=\frac{2}{6},\; \mathrm{P}(A=1,B=1)=P\{2\}=\frac{1}{6}
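A brief computational sketch of this example (plain Python; the variable names are illustrative) enumerates the six equally likely faces and tabulates the pairs (A, B):

from fractions import Fraction
from collections import Counter

# Enumerate the six equally likely faces of a fair die and tabulate (A, B),
# where A indicates "even" and B indicates "prime".
joint = Counter()
for face in range(1, 7):
    a = 1 if face % 2 == 0 else 0
    b = 1 if face in (2, 3, 5) else 0
    joint[(a, b)] += Fraction(1, 6)

for (a, b), p in sorted(joint.items()):
    print(f"P(A={a}, B={b}) = {p}")
# Prints 1/6, 1/3, 1/3, 1/6 for (0,0), (0,1), (1,0), (1,1), matching the values above.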

Cumulative distribution

The cumulative distribution function for a pair of random variables X and Y is defined in terms of their joint probability distribution:

F(x,y)=P(X \le x, Y \le y) .
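For the die example above, this cumulative distribution function can be obtained by summing the joint probability mass function over all pairs at or below (x, y); a minimal sketch, assuming the pmf tabulated earlier:

from fractions import Fraction

# Joint pmf of the die example (A = even indicator, B = prime indicator).
pmf = {(0, 0): Fraction(1, 6), (0, 1): Fraction(1, 3),
       (1, 0): Fraction(1, 3), (1, 1): Fraction(1, 6)}

def joint_cdf(x, y):
    """F(x, y) = P(A <= x, B <= y), accumulated from the joint pmf."""
    return sum(p for (a, b), p in pmf.items() if a <= x and b <= y)

print(joint_cdf(0, 0))  # 1/6
print(joint_cdf(1, 0))  # 1/6 + 1/3 = 1/2
print(joint_cdf(1, 1))  # 1, the whole sample space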

Discrete case

The joint probability mass function of two discrete random variables X and Y is equal to


\begin{align}
\mathrm{P}(X=x\ \mathrm{and}\ Y=y) & {} = \mathrm{P}(Y=y \mid X=x) \cdot \mathrm{P}(X=x) \\
& {} = \mathrm{P}(X=x \mid Y=y) \cdot \mathrm{P}(Y=y).
\end{align}

In general, the joint probability distribution of n discrete random variables X_1,...,X_n is equal to


\begin{align}
\mathrm{P}(X_1=x_1,\dots,X_n=x_n) = \; & \mathrm{P}(X_1=x_1)\cdot \\
& \mathrm{P}(X_2=x_2\mid X_1=x_1)\cdot \\
& \mathrm{P}(X_3=x_3\mid X_1=x_1,X_2=x_2) \cdot \\
& \dots \\
& \mathrm{P}(X_n=x_n\mid X_1=x_1,\dots,X_{n-1}=x_{n-1}).
\end{align}

This identity is known as the chain rule of probability.

Since these are probabilities, we have

\sum_x \sum_y \mathrm{P}(X=x\ \mathrm{and}\ Y=y) = 1.\;
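A short sketch checking both the factorization and the normalization for the die example (plain Python with fractions; the dictionary layout is an assumption of this illustration):

from fractions import Fraction

# Joint pmf of the die example, indexed as (a, b).
pmf = {(0, 0): Fraction(1, 6), (0, 1): Fraction(1, 3),
       (1, 0): Fraction(1, 3), (1, 1): Fraction(1, 6)}

# Marginal of A and conditional of B given A, computed from the joint pmf.
p_a = {a: sum(p for (aa, b), p in pmf.items() if aa == a) for a in (0, 1)}
p_b_given_a = {(b, a): pmf[(a, b)] / p_a[a] for (a, b) in pmf}

# P(A = a, B = b) = P(B = b | A = a) * P(A = a) for every pair ...
assert all(pmf[(a, b)] == p_b_given_a[(b, a)] * p_a[a] for (a, b) in pmf)
# ... and the joint probabilities sum to one.
assert sum(pmf.values()) == 1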

Continuous case

Similarly, for continuous random variables X and Y, the joint probability density function can be written as f_{X,Y}(x, y), and this is

f_{X,Y}(x,y) = f_{Y|X}(y|x)f_X(x) = f_{X|Y}(x|y)f_Y(y)\;

where f_{Y|X}(y|x) and f_{X|Y}(x|y) give the conditional densities of Y given X = x and of X given Y = y respectively, and f_X(x) and f_Y(y) give the marginal densities of X and Y respectively.

Again, since these are probability densities, one has

\int_x \int_y f_{X,Y}(x,y) \; dy \; dx= 1.
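The same identities can be checked numerically for a concrete continuous distribution. The sketch below uses a standard bivariate normal with correlation rho (SciPy is an assumption of this illustration); for that family the conditional of Y given X = x is normal with mean rho·x and variance 1 − rho²:

import numpy as np
from scipy import stats, integrate

rho = 0.5
joint = stats.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

x, y = 0.3, -1.2
# Compare f_{X,Y}(x, y) with f_{Y|X}(y|x) * f_X(x).
lhs = joint.pdf([x, y])
rhs = stats.norm(loc=rho * x, scale=np.sqrt(1 - rho**2)).pdf(y) * stats.norm().pdf(x)
print(np.isclose(lhs, rhs))  # True

# The joint density integrates (numerically, over a wide box) to one.
total, _ = integrate.dblquad(lambda t, s: joint.pdf([s, t]), -8, 8, -8, 8)
print(round(total, 6))  # approximately 1.0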

Mixed case

In some situations X is continuous but Y is discrete. For example, in a logistic regression, one may wish to predict the probability of a binary outcome Y conditional on the value of a continuously distributed X. In this case, (X, Y) has neither a probability density function nor a probability mass function in the sense of the terms given above. On the other hand, a "mixed joint density" can be defined in either of two ways:


\begin{align}
f_{X,Y}(x,y) &= f_{X|Y}(x|y)\mathrm{P}(Y=y)\\
             &= \mathrm{P}(Y=y \mid X=x) f_X(x)
\end{align}

Formally, f_{X,Y}(x, y) is the probability density function of (X, Y) with respect to the product measure on the respective supports of X and Y. Either of these two decompositions can then be used to recover the joint cumulative distribution function:


\begin{align}
F_{X,Y}(x,y)&=\sum\limits_{t\le y}\int_{s=-\infty}^x f_{X,Y}(s,t)\;ds
\end{align}

The definition generalizes to a mixture of arbitrary numbers of discrete and continuous random variables.
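A sketch of such a mixed joint density, assuming the logistic-regression-style model X ~ N(0, 1) and P(Y = 1 | X = x) = sigmoid(x) (SciPy and the logistic link are assumptions of this illustration):

import numpy as np
from scipy import stats, integrate

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Mixed joint density f_{X,Y}(x, y) = P(Y = y | X = x) * f_X(x),
# with X continuous (standard normal) and Y binary.
def mixed_density(x, y):
    p1 = sigmoid(x)
    return (p1 if y == 1 else 1.0 - p1) * stats.norm.pdf(x)

# Summing over the discrete variable and integrating over the continuous one
# recovers total probability one.
total = sum(integrate.quad(lambda x: mixed_density(x, y), -10, 10)[0] for y in (0, 1))
print(round(total, 6))  # approximately 1.0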

General multidimensional distributions

The cumulative distribution function for a vector of random variables is defined in terms of their joint probability distribution:

F(x_1,\dots,x_n)=P(X_1 \le x_1,\dots, X_n \le x_n) .

The joint distribution for two random variables can be extended to many random variables X_1, \dots, X_n by adding them sequentially with the identity

\begin{align} f_{X_1, \ldots X_n}(x_1, \ldots x_n) =& f_{X_n | X_1, \ldots X_{n-1}}( x_n | x_1, \ldots x_{n-1}) f_{X_1, \ldots X_{n-1}}( x_1, \ldots x_{n-1} )\\
=& f_{X_1} (x_1) \\
 & \cdot f_{X_2|X_1} (x_2|x_1)\\
 & \cdot \dots \\
 & \cdot f_{X_{n-1}| X_1 \ldots X_{n-2}}(x_{n-1}| x_1, \ldots x_{n-2} ) \\
 & \cdot f_{X_n | X_1, \ldots X_{n-1}}( x_n | x_1, \ldots x_{n-1}),\end{align}

where

\begin{align}
f_{X_i| X_1, \ldots X_{i-1}}(x_i | x_1, \ldots x_{i-1})=
  &\frac{f_{X_1, \dots X_i}(x_1,\dots x_i)}{\int f_{X_1, \dots X_i}(x_1,\dots x_{i-1},u_i) \mathrm{d} u_i}\\
= &\frac{\int \dots \int f_{X_1, \dots X_n}(x_1,\dots x_i,u_{i+1}, \dots u_n) \mathrm{d} u_{i+1}\dots \mathrm{d}u_n}{\int \dots \int \int f_{X_1, \dots X_n}(x_1,\dots x_{i-1},u_i, \dots u_n) \mathrm{d} u_i \,\mathrm{d} u_{i+1}\dots \mathrm{d}u_n}
\end{align}

and

f_{X_1,\dots X_i}(x_1,\dots x_i) = \int \dots \int f_{X_1,\dots X_n}(x_1,\dots x_i,x_{i+1},\dots x_n) \mathrm{d} x_{i+1} \dots \mathrm{d} x_n

(notice that these latter identities can be useful for generating a random vector (X_1, \dots X_n) with a given joint density function f(x_1,\dots x_n)); the density of the marginal distribution is

f_{X_i}(x_i) = \int \dots \int \int \dots \int f_{X_1,\dots X_n}(x_1,\dots x_{i-1},x_i,x_{i+1},\dots x_n) \mathrm{d} x_1\dots \mathrm{d}x_{i-1} \, \mathrm{d}x_{i+1} \dots \mathrm{d}x_n.

The joint cumulative distribution function is

F_{X_1,\dots X_n}\left( x_1, \dots x_n\right)= \int_{-\infty}^{x_1} \dots \int_{-\infty}^{x_n} f_{X_1,\dots X_n}\left(u_1,\dots u_n\right) \mathrm{d} u_1 \dots \mathrm{d}u_n,
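These marginalization and conditioning operations can be illustrated on a density tabulated on a regular grid; the sketch below discretizes a trivariate normal (NumPy/SciPy and the particular covariance matrix are assumptions of this illustration) and recovers the marginal of X_1 and a conditional of X_3:

import numpy as np
from scipy import stats

# Tabulate a trivariate normal density on a regular grid.
grid = np.linspace(-5, 5, 101)
dx = grid[1] - grid[0]
X1, X2, X3 = np.meshgrid(grid, grid, grid, indexing="ij")
cov = np.array([[1.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 1.0]])
f = stats.multivariate_normal(mean=np.zeros(3), cov=cov).pdf(np.stack([X1, X2, X3], axis=-1))

# Marginal density of X_1: integrate out x_2 and x_3 (Riemann sums on the grid).
f_x1 = f.sum(axis=(1, 2)) * dx * dx
print(np.allclose(f_x1, stats.norm.pdf(grid), atol=1e-3))  # True: X_1 is standard normal

# Conditional density of X_3 given X_1 = grid[i], X_2 = grid[j]:
# the joint slice divided by its integral over x_3.
i, j = 50, 60
f_x3_given = f[i, j, :] / (f[i, j, :].sum() * dx)
print(round(float(f_x3_given.sum() * dx), 6))  # 1.0, a valid conditional density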

and the conditional distribution function is accordingly

\begin{align}
F_{X_i| X_1, \ldots X_{i-1}}(x_i| x_1, \ldots x_{i-1})=
  &\frac{\int_{-\infty}^{x_i}f_{X_1, \dots X_i}(x_1,\dots x_{i-1},u_i)\mathrm{d}u_i}{\int_{-\infty}^\infty f_{X_1, \dots X_i}(x_1,\dots x_{i-1},u_i) \mathrm{d} u_i}\\
= &\frac{\int_{-\infty}^\infty \dots \int_{-\infty}^\infty \int_{-\infty}^{x_i} f_{X_1, \dots X_n}(x_1,\dots x_{i-1},u_i, \dots u_n) \mathrm{d} u_i\dots \mathrm{d}u_n}{\int_{-\infty}^\infty \dots \int_{-\infty}^\infty \int_{-\infty}^\infty f_{X_1, \dots X_n}(x_1,\dots x_{i-1},u_i,\dots u_n) \mathrm{d} u_i \dots \mathrm{d} u_n}.
\end{align}

The expectation of a function h of the random variables reads

\mathbb{E}\left[h(X_1,\dots X_n) \right]=\int_{-\infty}^\infty \dots \int_{-\infty}^\infty h(x_1,\dots x_n) f_{X_1,\dots X_n}(x_1,\dots x_n) \mathrm{d} x_1 \dots \mathrm{d} x_n;

Suppose that h is smooth enough and that h(u_1,\dots u_n)=h(x_1,\dots x_n) whenever u_1 \ge x_1, \dots, u_n\ge x_n; then, by iterated integration by parts,

\begin{align}\mathbb{E}\left[h(X_1,\dots X_n) \right]=& h(x_1,\dots x_n)+ \\
& (-1)^n \int_{-\infty}^{x_1} \dots \int_{-\infty}^{x_n} F_{X_1,\dots X_n}(u_1,\dots u_n) \frac{\partial^n}{\partial x_1 \dots \partial x_n} h(u_1,\dots u_n) \mathrm{d} u_1 \dots \mathrm{d} u_n.\end{align}
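As a numerical illustration of the expectation formula (not of the integration-by-parts identity), the sketch below evaluates E[h(X_1, X_2)] for a bivariate normal both by quadrature against the joint density and by Monte Carlo; the choice h(x_1, x_2) = x_1 x_2, whose expectation is the covariance rho, is an assumption of this illustration:

import numpy as np
from scipy import stats, integrate

rho = 0.4
joint = stats.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
h = lambda x1, x2: x1 * x2  # E[X1 * X2] equals the covariance, rho

# Expectation as the integral of h against the joint density.
val, _ = integrate.dblquad(lambda t, s: h(s, t) * joint.pdf([s, t]), -8, 8, -8, 8)
print(round(val, 3))  # approximately 0.4

# The same expectation estimated by Monte Carlo sampling from the joint distribution.
samples = joint.rvs(size=200_000, random_state=0)
print(h(samples[:, 0], samples[:, 1]).mean())  # close to 0.4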

Joint distribution for independent variables

If for discrete random variables \ P(X = x \ \mbox{and} \ Y = y ) = P( X = x) \cdot P( Y = y) for all x and y, or for absolutely continuous random variables \ f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y) for all x and y, then X and Y are said to be independent.
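In the die example above, A and B are not independent, since, e.g., P(A=0, B=0) = 1/6 while P(A=0)·P(B=0) = 1/4; a small sketch of this check (plain Python, with the pmf laid out as before):

from fractions import Fraction
from itertools import product

pmf = {(0, 0): Fraction(1, 6), (0, 1): Fraction(1, 3),
       (1, 0): Fraction(1, 3), (1, 1): Fraction(1, 6)}
p_a = {a: sum(p for (aa, b), p in pmf.items() if aa == a) for a in (0, 1)}
p_b = {b: sum(p for (a, bb), p in pmf.items() if bb == b) for b in (0, 1)}

# Independence would require P(A=a, B=b) = P(A=a) * P(B=b) for every pair.
print(all(pmf[(a, b)] == p_a[a] * p_b[b] for a, b in product((0, 1), repeat=2)))  # False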

Joint distribution for conditionally independent variables

If a subset A of the variables X_1,\dots,X_n is conditionally independent given another subset B of these variables, then the joint distribution \mathrm{P}(X_1,\dots,X_n) is equal to P(B)\cdot P(A\mid B); because the variables in A are conditionally independent given B, P(A\mid B) itself factors into a product of smaller conditionals. The joint distribution can therefore be efficiently represented by the lower-dimensional probability distributions P(B) and P(A\mid B). Such conditional independence relations can be represented with a Bayesian network.
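A small sketch of this factorization for three binary variables, where A = {X_1, X_2} is conditionally independent given B = {X_3} (the numerical tables and variable names are purely illustrative):

from itertools import product

# Illustrative tables P(X3), P(X1 | X3) and P(X2 | X3); conditional independence
# of X1 and X2 given X3 means P(A | B) = P(X1 | X3) * P(X2 | X3).
p_x3 = {0: 0.7, 1: 0.3}
p_x1_given_x3 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}
p_x2_given_x3 = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.5, (1, 1): 0.5}

# Joint distribution assembled as P(B) * P(A | B); this needs 1 + 2 + 2 = 5 free
# parameters instead of the 7 of an unconstrained joint over three binary variables.
joint = {(x1, x2, x3): p_x3[x3] * p_x1_given_x3[(x1, x3)] * p_x2_given_x3[(x2, x3)]
         for x1, x2, x3 in product((0, 1), repeat=3)}
print(round(sum(joint.values()), 6))  # 1.0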
